Effectively Building Tera Scale MaxEnt Language Models Incorporating Non-Linguistic Signals
Authors
Abstract
Maximum Entropy (MaxEnt) language models are powerful models that can incorporate linguistic and non-linguistic contextual signals in a unified framework with a convex loss. MaxEnt models also have the advantage of scaling to large model and training data sizes. We present the following two contributions to MaxEnt training: (1) by leveraging smaller amounts of transcribed data, we demonstrate that a MaxEnt LM trained on various types of corpora can be easily adapted to better match the test distribution of Automatic Speech Recognition (ASR); (2) we present a novel adaptive-training approach that efficiently models multiple types of non-linguistic features in a universal model. We evaluate the impact of these approaches on Google’s state-of-the-art ASR for the task of voice-search transcription and dictation. Training 10B-parameter models on a corpus of up to 1T words, we show large reductions in word error rate from adaptation across multiple languages. In addition, human evaluations show significant improvements on a wide range of domains from using non-linguistic features. For example, adapting to geographical domains (e.g., US states and cities) affects about 4% of test utterances, with a 2:1 win-to-loss ratio.
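Since the abstract describes the model family only in compact terms, the following Python sketch shows a tiny MaxEnt LM whose feature set crosses n-grams with a geographic signal, the kind of non-linguistic feature the paper adapts to. Everything in it (the hashed feature space NUM_BUCKETS, the feature templates, plain unregularized SGD, and the toy geo values) is an illustrative assumption, not the paper's actual 10B-parameter training pipeline.

import math
from collections import defaultdict

NUM_BUCKETS = 2 ** 20  # hashed feature space; production models reach ~10B parameters

def features(history, word, geo=None):
    """Hash n-gram features into integer ids; when a non-linguistic
    signal is present, also emit (geo x n-gram) cross features."""
    ids = []
    for order in range(len(history) + 1):
        ngram = tuple(history[len(history) - order:]) + (word,)
        ids.append(hash(("ngram", ngram)) % NUM_BUCKETS)
        if geo is not None:
            ids.append(hash(("geo", geo, ngram)) % NUM_BUCKETS)
    return ids

def prob(weights, history, word, vocab, geo=None):
    """p(word | history, geo) under the log-linear (MaxEnt) model."""
    scores = {v: sum(weights[f] for f in features(history, v, geo)) for v in vocab}
    z = sum(math.exp(s) for s in scores.values())
    return math.exp(scores[word]) / z

def sgd_step(weights, history, word, vocab, geo=None, lr=0.1):
    """One SGD step on the convex negative log-likelihood."""
    probs = {v: prob(weights, history, v, vocab, geo) for v in vocab}
    for v in vocab:
        grad = probs[v] - (1.0 if v == word else 0.0)
        for f in features(history, v, geo):
            weights[f] -= lr * grad

weights = defaultdict(float)
vocab = ["pizza", "tacos", "chowder"]
for _ in range(50):
    sgd_step(weights, ["best"], "chowder", vocab, geo="boston")
    sgd_step(weights, ["best"], "tacos", vocab, geo="austin")
print(round(prob(weights, ["best"], "chowder", vocab, geo="boston"), 3))  # high
print(round(prob(weights, ["best"], "chowder", vocab, geo="austin"), 3))  # low

The point of the geo x n-gram crosses is that a single universal model can prefer different completions per region, which is roughly what the geographical-adaptation human evaluation in the abstract measures.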
Similar resources
Incorporating Cognitive Linguistic Insights into Classrooms: the Case of Iranian Learners’ Acquisition of If-Clauses
Cognitive linguistics offers the most inclusive and consistent description to date of how language is organized, used, and learned, and it contains a great number of concepts that are useful to second language learners. If-clauses in English, on the other hand, remain a persistent struggle for foreign language learners due to their intrinsic intricacies. EFL grammar books are ...
Feature-rich sub-lexical language models using a maximum entropy approach for German LVCSR
German is a morphologically rich language with a high degree of word inflection, derivation, and compounding. This leads to high out-of-vocabulary (OOV) rates and poor language model (LM) probabilities in large vocabulary continuous speech recognition (LVCSR) systems. One of the main challenges in German LVCSR is the recognition of OOV words. For this purpose, data-driven morphem...
Life-wise Language Learning Textbooks: Construction and Validation of an Emotional Abilities Scale through Rasch Modeling
Underlying the recently developed notions of applied ELT and life syllabus is the idea that language classes should give precedence to learners’ life qualities, for instance emotional intelligence (EI), over and above their language skills. In so doing, ELT is ascribed an autonomous status, and ELT classes can devote their full potential to learners. With that in mind, this study aimed to d...
Incorporating Linguistic Knowledge for Learning Distributed Word Representations
Combined with neural language models, distributed word representations achieve significant advantages in computational linguistics and text mining. Most existing models estimate distributed word vectors from large-scale data in an unsupervised fashion but do not take rich linguistic knowledge into consideration. Linguistic knowledge can be represented as either link-based knowledge...
Backoff inspired features for maximum entropy language models
Maximum Entropy (MaxEnt) language models [1, 2] are linear models that are typically regularized via well-known L1 or L2 terms in the likelihood objective, hence avoiding the need for the kinds of backoff or mixture weights used in smoothed n-gram language models using Katz backoff [3] and similar techniques. Even though a backoff cost is not required to regularize the model, we investigate the us...
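For reference, the L1/L2-regularized MaxEnt objective this abstract alludes to is standardly written as follows, where \phi(h, w) is the feature vector for history h and predicted word w, and \lambda_1, \lambda_2 are the regularization strengths:

\mathcal{L}(\theta) = -\sum_{(h,w)} \log \frac{\exp\left(\theta^{\top}\phi(h,w)\right)}{\sum_{w' \in V} \exp\left(\theta^{\top}\phi(h,w')\right)} + \lambda_1 \lVert\theta\rVert_1 + \frac{\lambda_2}{2} \lVert\theta\rVert_2^2

Because this objective is convex in \theta, the regularizers alone suffice to control overfitting, which is why explicit backoff weights are optional in this family of models.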